Proficiency Assessment of ESL Learner's Sentence Prosody with TTS Synthesized Voice as Reference
Authors
Abstract
We investigate how to assess the prosody quality of an ESL learner's spoken sentence against a native speaker's natural recording or a TTS-synthesized voice. A spoken English utterance read by an ESL learner is compared with a recording of the same sentence by a native speaker, or with a TTS voice. The corresponding F0 contours (with voicing) and breaks are compared at the mapped syllable level via dynamic time warping (DTW). The correlations between the prosody patterns of the learner and the native speaker (or TTS voice) for the same sentence are computed after the speech rates and F0 distributions of the speakers are equalized. Using the correlation coefficients obtained from collected native and non-native speaker databases, we model them as continuous distributions with Gaussian mixtures and train a two-class (native vs. non-native) neural network classifier. We found that classification accuracy is close whether the reference is a native speaker's recording or a TTS voice: 91.2% vs. 88.1%, respectively. For assessing the prosody proficiency of an ESL learner from a single input sentence, the prosody patterns of our high-quality TTS are almost as effective as those of native speakers' recordings, which are more expensive and less convenient to collect.
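The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each utterance has already been reduced to one mean F0 value per syllable, uses z-scoring as a stand-in for F0 distribution equalization, lets DTW absorb speech-rate differences, and ignores voicing and breaks. The function names `zscore`, `dtw_path`, and `prosody_correlation` are hypothetical.

```python
import numpy as np


def zscore(values):
    """Normalize a contour to zero mean and unit variance.

    Assumption: this stands in for the F0 distribution equalization
    mentioned in the abstract. The contour must not be constant.
    """
    x = np.asarray(values, dtype=float)
    return (x - x.mean()) / x.std()


def dtw_path(a, b):
    """Return the optimal DTW alignment between 1-D sequences a and b
    as a list of (index_in_a, index_in_b) pairs."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist = abs(a[i - 1] - b[j - 1])
            cost[i, j] = dist + min(
                cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]
            )
    # Backtrack from the end of both sequences to their start.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (cost[i - 1, j - 1], i - 1, j - 1),
            (cost[i - 1, j], i - 1, j),
            (cost[i, j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    return path[::-1]


def prosody_correlation(learner_f0, reference_f0):
    """Pearson correlation of two F0 contours after normalization
    and DTW alignment (hypothetical per-syllable F0 inputs)."""
    learner = zscore(learner_f0)
    reference = zscore(reference_f0)
    path = dtw_path(learner, reference)
    aligned_learner = np.array([learner[i] for i, _ in path])
    aligned_reference = np.array([reference[j] for _, j in path])
    return float(np.corrcoef(aligned_learner, aligned_reference)[0, 1])


# Illustrative values only (Hz per syllable), not data from the paper.
reference = [120, 150, 180, 160, 130, 110]
learner = [200, 245, 290, 260, 215, 185]
print(prosody_correlation(learner, reference))
```

In the pipeline the abstract describes, correlation coefficients of this kind, computed for F0 and breaks, would then be modeled with Gaussian mixtures and passed to the native vs. non-native classifier; that stage is not sketched here.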
Similar resources
Aperiodicity Analysis for Quality Estimation of Text-to-Speech Signals
This contribution presents a new approach towards non-intrusive quality assessment of text-to-speech (TTS) signals. Perturbation measures which capture the degree of excitation-specific aperiodicity in voiced speech are investigated concerning their quality implications in synthesized speech. Based on two independent TTS databases for which formal attribute-based listening tests have been conducte...
Adapting Prosody in a Text-to-Speech System
The requirements of the evolving information communication technologies (ICT) place new demands on text-to-speech (TTS) systems. The modern high quality TTS system has to be capable of fast and high-quality adaptation to a new language, voice or even expressive speech. Thus adaptation to new voices with different prosodic characteristics is desired. In this chapter a survey of recent and past a...
Improving intelligibility of synthesized speech in noise with emphasized prosody
The performance of current high quality concatenative text-to-speech (TTS) systems is limited under noisy environments. This paper investigates whether or not the intelligibility of synthesized speech in noise can be improved by emphasizing the prosody. Additionally, the paper presents a method that can effectively emphasize the prosody of units in existing TTS databases. The circular linear pr...
Performance Analysis of Text To Speech Synthesis System Using HMM And Prosody Features With Parsing For Tamil Language
This paper describes a Hidden Markov Model (HMM) based text-to-speech (TTS) system and a prosody-based TTS system for producing natural-sounding synthetic speech in the Tamil language. The HMM-based system consists of two phases, training and synthesis. Tamil speech is first parameterized into spectral and excitation features using Glottal Inverse Filtering (GIF). An emotions present in the input text is...
The New Slovenian Text-to-Speech System
Human-computer interaction in a natural language is becoming possible due to rapid development of computer power. While text-to-speech (TTS) systems for major world languages are quite advanced, smaller languages, like our Slovenian language, lack quality TTS synthesis. At the "Jozef Stefan" Institute a system called GOVOREC (SPEAKER) has been developed which is capable of automatic conversion ...
Publication date: 2017